


Planning in entropy-regularized Markov decision processes and games

Neural Information Processing Systems

We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the Markov decision process or game. SmoothCruiser makes use of the smoothness of the Bellman operator promoted by the regularization to achieve problem-independent sample complexity of order $\tilde{\mathcal{O}}(1/\epsilon^4)$ for a desired accuracy $\epsilon$, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case.
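To illustrate the smoothness the abstract refers to: entropy regularization replaces the hard max over action values in the Bellman backup with a log-sum-exp, which is a smooth (Lipschitz-gradient) function of the Q-values. The sketch below is an illustrative example of that standard soft backup, not code from the paper; the temperature parameter `lam` is an assumption of this example.

```python
import math

def soft_backup(q_values, lam):
    """Entropy-regularized (soft) Bellman backup: replaces the hard
    max over action values with a smooth log-sum-exp.
    lam > 0 is the regularization temperature; as lam -> 0 the soft
    backup approaches max(q_values). Illustrative sketch only."""
    m = max(q_values)  # shift by the max for numerical stability
    return m + lam * math.log(sum(math.exp((q - m) / lam) for q in q_values))
```

For example, two equal action values of 0 at temperature 1 back up to log 2, while at a very small temperature the backup is essentially the hard max.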



Reviews: Planning in entropy-regularized Markov decision processes and games

Neural Information Processing Systems

This theoretical paper considers the problem of computing the optimal value function in entropy-regularized MDPs and two-player games. It shows that the smoothness property of the Bellman operator in the presence of entropy-regularized policies (and possibly other forms of regularization) can be used to derive a sample complexity that is polynomial, of order O((1/ε)^{4+c}), with c a problem-independent constant and ε the precision of the value function estimate. The proof is built upon the proposed algorithm, SmoothCruiser, an algorithm inspired by the sparse sampling algorithm of Kearns et al. that recursively estimates V through samples and subsequently aggregates the results. This sampled dynamic programming is carried out up to a depth at which the required number of samples is no longer polynomial. The paper is very well written and provides a solid result.
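The sparse-sampling recursion the reviewer describes can be sketched as follows. This is an illustrative Kearns-style estimator combined with the smooth backup, not the paper's actual SmoothCruiser pseudocode; the `model(state, action) -> (reward, next_state)` generative-model interface, the fixed depth cutoff, and all parameter names are assumptions of this sketch.

```python
import math

def estimate_v(state, depth, model, actions, lam, gamma, n_samples):
    """Sparse-sampling estimate of the entropy-regularized value
    function, in the spirit of Kearns et al.; illustrative sketch,
    not the paper's SmoothCruiser algorithm.
    model(state, action) -> (reward, next_state) is the assumed
    generative-model interface."""
    if depth == 0:
        return 0.0  # truncate the recursion at a fixed horizon
    q = []
    for a in actions:
        total = 0.0
        for _ in range(n_samples):  # Monte Carlo over sampled next states
            r, s_next = model(state, a)
            total += r + gamma * estimate_v(s_next, depth - 1, model,
                                            actions, lam, gamma, n_samples)
        q.append(total / n_samples)
    # smooth (log-sum-exp) backup induced by the entropy regularization
    m = max(q)
    return m + lam * math.log(sum(math.exp((x - m) / lam) for x in q))
```

Note the cost grows exponentially with depth (|actions| × n_samples branches per level), which is why the recursion is only run to a bounded depth, as the review points out.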


Reviews: Planning in entropy-regularized Markov decision processes and games

Neural Information Processing Systems

The reviewers were in consensus that this is an interesting and well written paper with a significant theoretical contribution. While empirical results should not be strictly required for a paper that is strong theoretically, they would nonetheless greatly improve the paper, and thus the authors are strongly encouraged to include them in the final version, even if they are relegated to supplementary material.



Planning in entropy-regularized Markov decision processes and games

Grill, Jean-Bastien, Domingues, Omar Darwiche, Menard, Pierre, Munos, Remi, Valko, Michal

Neural Information Processing Systems
